Abstract. This paper addresses link pattern prediction in collections of
objects connected by multiple relation types, where each type may play a
distinct role. Whereas common link analysis models are limited to single-type
link prediction, we attempt to capture the correlations among different
relation types and to reveal the impact of the various relation types on
prediction quality. To that end, we define the overall relations between an
object pair as a link pattern, which consists of the interaction pattern and
the connection structure in the network, and we use a tensor formalization to
jointly model and predict the link patterns. We refer to this task as the Link
Pattern Prediction (LPP) problem. To address it, we propose a Probabilistic
Latent Tensor Factorization (PLTF) model that introduces an additional latent
factor for the relation types, and we give the model a Hierarchical Bayesian
treatment to avoid overfitting. To learn the model we develop an efficient
Markov Chain Monte Carlo sampling method. Extensive experiments on several
real-world datasets demonstrate significant improvements over several existing
state-of-the-art methods.

1 Introduction 

Modeling relational data has been an active area of research in recent years, 
and is becoming an increasingly important problem in many applications such 
as social network analysis and recommender systems. Link prediction [1] as one 
basic challenge is concerned with predicting unobserved links between object 
pairs based on the observed structure in the network. A typical example is a 
social network where people are linked via explicit relations, such as friendship 
or membership; or implicit ones like the sharing of similar interests. Up to now, 
most of the related models developed for link prediction either consider only 
single-type relations among objects or treat the different relations in the network 
homogeneously [2] [3], thus ignoring the multi-dimensional nature of interactions
and the potential complexity of the interaction schemes in the networks. 

In this paper, we focus on the task of predicting multiple relation types among
object pairs in multi-relational networks. To that end, we define the overall
relations between each pair of objects as a link pattern, which consists of the
interaction pattern and the connection structure among objects. This task is
illustrated in Figure 1. The left part of Figure 1 shows a social network
composed of a set of individuals with multiple relations among them, where the
link patterns involving multiple relations between certain object pairs are
unobserved (indicated by "?" in the figure). The task is to infer the missing
link patterns from the observed part of the network, which we refer to as the
Link Pattern Prediction (LPP) problem. The extracted link patterns capture the
fine and subtle structure of multi-relational networks and can be used to
improve the range and performance of social network applications, including
community detection and person recommendation with different social roles.

Fig. 1. Example of link pattern prediction task. There are four relation types
(Alumni relations, Colleagues, Gym membership and Common interest) in the
social network. The sets of "?" represent unknown link patterns.

In the context of the Link Pattern Prediction problem, we therefore propose a
probabilistic tensor factorization framework that models the multi-relational
data by treating tensor factorization as a latent factor model. Our model
addresses two major challenges that are ignored by previous work on link
prediction. The first challenge is the multi-relational nature of networked
data. In addition to using latent factors to characterize object features, we
introduce another latent factor for the relations, to capture the correlations
among multiple relation types and reveal the impact of distinct relation types
on prediction quality. The second challenge is data sparsity. Social networks,
for example, are usually very sparse: relations are present for only a very
small fraction of all possible pairs of users. To counter the overfitting
caused by sparse data, we extend our probabilistic model with a Bayesian
learning method for inferring the latent factors. The Bayesian treatment
reduces overfitting by placing priors on the model parameters, and it handles
missing data easily.

Moreover, we learn the parameters with an efficient Markov Chain Monte Carlo
(MCMC) method. We conduct experiments on several real-world multi-relational
datasets; the empirical results demonstrate the improved prediction accuracy
and the effectiveness of our models.

The rest of the paper is structured as follows. We first briefly review related 
work in Section 2. Then we introduce our link pattern prediction task and for- 
mulate the probabilistic latent tensor factorization model for solving the LPP 
problem in Section 3. We also provide a fully Hierarchical Bayesian treatment 
to optimize the probabilistic model and derive an efficient Markov Chain Monte 
Carlo optimization method in Section 4. We describe experiments on three real- 
world datasets to study the efficiency of our model and compare it to several 
models in Section 5. In Section 6 we present conclusions and future work. 

2 Related Work 

Early link prediction models are based entirely on structural properties of
the observed network; [3] compares many predictors based on different graph
proximity measures. Latent factor models have since received more and more
attention. [4] propose stochastic relational models for the link prediction
problem, which are essentially Gaussian process models. [5] extend the matrix
factorization model to a probabilistic framework for collaborative filtering
tasks. Another interesting direction concerns the prediction of relationship
strength [6]. However, all of the work above focuses only on the single-type
link prediction task.

The problem of link prediction in multi-relational networks has only been
addressed very recently, e.g. [7] [8]. Our work is related to the
multi-relational learning problem, where several relations are jointly modeled.
Different strategies have been developed to enable parameter sharing when
jointly factorizing a collection of related matrices. For example, [9]
introduced the nonparametric latent feature relational model, which infers only
the latent features of each entity. [10] proposed a statistical model with
latent matrix representations of the objects and a regression term to predict
the missing links in network data without considering relation types. In
contrast, our proposed model additionally introduces a latent factor for the
relation types, to capture their correlations and explore the impact of
distinct relation types on prediction performance.

Tensor factorization has also attracted a lot of attention in the data mining
community and has been used in many applications, such as web link analysis
[11] [2], analysis of email communications [12], personalized tag
recommendation [13] and collaborative filtering over time [14]. There are also
other tensor factorization models [15] [16], which are not suitable for our
proposed LPP problem.

3 Probabilistic Latent Tensor Factorization Model for 
Link Pattern Prediction 

In this section, we first present the problem definition for the link pattern
prediction task, then propose a latent tensor factorization framework to model
the multi-relational data, and develop a maximum a posteriori (MAP) method to
infer the latent factors.

Fig. 2. Example of modeling the multi-relational social network as a tensor for
predicting the unobserved link patterns in the network. To the right is the
tensor representation of the network, where each slice matrix represents one
relation type (Alumni, Membership, Colleagues and Common-interest relations)
and each tube fiber represents the link pattern between two nodes. Unknown link
patterns are represented as "?" fibers.



3.1 Problem Definition 

Suppose we are modeling pair-wise relations between objects. Formally, let
$X = \{x_1, x_2, \ldots, x_N\}$ represent a set of $N$ objects, and assume there
are $T$ different types of relations among object pairs, defined by a set of
$T$ adjacency matrices. Exploiting the multi-dimensional nature of the tensor
representation, we model the multi-relational data in the network by an
$N \times N \times T$ third-order tensor $\mathcal{Y}$, where an entry
$y_{i,j,t}$ representing the $t$-type relation value between the object pair
$(x_i, x_j)$ is defined as follows: $\forall (i,j) \in [1..N]^2$,
$\forall t \in [1..T]$,

$$
y_{i,j,t} =
\begin{cases}
1 & \text{if a $t$-type relation exists for $x_i$ and $x_j$;} \\
0 & \text{if no $t$-type relation exists for $x_i$ and $x_j$;} \\
? & \text{if the relation for $x_i$ and $x_j$ is unknown.}
\end{cases}
$$

We also define another $N \times N \times T$ third-order tensor $\mathcal{I}$,
which serves as the indicator tensor: $\mathcal{I}_{i,j,t}$ is equal to 1 if
the link is observed and equal to 0 when the link status is missing for the
pair of objects.

Then, we define $\mathcal{Y}_{i,j,:}$ as the link pattern involving $T$
different types of relations between each pair of objects $x_i$ and $x_j$,
which is naturally represented by a tube fiber in the tensor model. Given the
observed link patterns for some of the object pairs in the multi-relational
network, we are interested in predicting the unobserved link patterns for the
remaining object pairs, as illustrated in Figure 2. We refer to this task as
the Link Pattern Prediction (LPP) problem.
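The tensor representation of Section 3.1 can be illustrated with a minimal sketch. It assumes that relations arrive as one edge list per relation type and that unknown entries are stored as NaN; the function name `build_relation_tensor` and the toy network are hypothetical and not part of the original work.

```python
import numpy as np

def build_relation_tensor(n_objects, relations, unknown_pairs):
    """Stack T edge lists into an N x N x T tensor Y plus an indicator tensor I.

    relations: list of T edge lists; relations[t] holds the (i, j) pairs for
        which a t-type relation exists.
    unknown_pairs: (i, j) pairs whose whole link pattern is unobserved.
    """
    n_types = len(relations)
    Y = np.zeros((n_objects, n_objects, n_types))
    I = np.ones((n_objects, n_objects, n_types))
    for t, edges in enumerate(relations):
        for i, j in edges:
            Y[i, j, t] = 1.0
    for i, j in unknown_pairs:
        Y[i, j, :] = np.nan  # the "?" entries of the definition
        I[i, j, :] = 0.0     # marked as unobserved in the indicator tensor
    return Y, I

# Hypothetical toy network: 3 people, 2 relation types.
relations = [[(0, 1), (1, 2)], [(0, 1)]]
Y, I = build_relation_tensor(3, relations, unknown_pairs=[(2, 0)])
link_pattern = Y[0, 1, :]  # tube fiber for the pair (x_0, x_1)
```

Here `link_pattern` is the tube fiber of one object pair, and the fiber of the pair listed in `unknown_pairs` is the kind of entry the LPP task has to predict.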



3.2 Probabilistic Latent Tensor Factorization Model 



It has been shown that missing links can be inferred from the inner product of
the corresponding latent features obtained by matrix factorization [5] [9].
However, in the LPP problem the link patterns generate multi-dimensional data,
which are difficult to handle by matrix factorization. Motivated by this
observation, we extend Probabilistic Matrix Factorization [5] to a tensor
factorization version and propose a Probabilistic Latent Tensor Factorization
(PLTF) model to explore the latent factor matrices and discover the unobserved
link pattern information.

The proposed PLTF model assigns different latent factor matrices to the two
objects of a related pair, denoted as $U \in \mathbb{R}^{N \times D}$ and
$V \in \mathbb{R}^{N \times D}$, and a specific latent feature factor to the
relation types, denoted as $R \in \mathbb{R}^{T \times D}$. The probability of
having a relation between objects can then be learned via a third-order tensor
factorization of the data $\mathcal{Y}$.

In this work we consider the CANDECOMP/PARAFAC (CP) tensor factorization
[17] [18] for the data modeling, and introduce the generalized tensor
factorization model, which can be written as follows:

$$
y_{i,j,t} \sim f\Big(\sum_{d=1}^{D} U_{i,d} V_{j,d} R_{t,d} + \mathcal{E}_{i,j,t}\Big) \tag{1}
$$

where $\mathcal{E}$ is the noise term in tensor form.
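As an illustration of the CP term inside Equation (1), the following sketch computes the noise-free reconstruction for all triples $(i, j, t)$ at once. The helper name `cp_reconstruct` and the toy dimensions are assumptions made for this example only.

```python
import numpy as np

def cp_reconstruct(U, V, R):
    """Return the N x N x T tensor with entries sum_d U[i,d] * V[j,d] * R[t,d]."""
    return np.einsum("id,jd,td->ijt", U, V, R)

# Hypothetical toy factors: N = 4 objects, T = 3 relation types, D = 2.
rng = np.random.default_rng(0)
N, T, D = 4, 3, 2
U = rng.normal(size=(N, D))
V = rng.normal(size=(N, D))
R = rng.normal(size=(T, D))
Y_hat = cp_reconstruct(U, V, R)
```

Each entry of `Y_hat` is a three-way inner product, so a relation type reweights the $D$ shared dimensions through its row of `R`.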

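Assuming the noise term $\mathcal{E}$ is Gaussian, the observed data
$\mathcal{Y}$ (see footnote 1) in the generalized tensor factorization model
follow a multivariate Gaussian distribution:

$$
p(\mathcal{Y} \mid U, V, R, \alpha) = \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{t=1}^{T}
\Big[ \mathcal{N}\Big( y_{i,j,t} \,\Big|\, \sum_{d=1}^{D} U_{i,d} V_{j,d} R_{t,d},\ \alpha^{-1} \Big) \Big]^{\mathcal{I}_{i,j,t}} \tag{2}
$$

where $\mathcal{N}(\cdot)$ denotes the Gaussian distribution with precision
$\alpha$ and mean $\sum_{d=1}^{D} U_{i,d} V_{j,d} R_{t,d}$ obtained by the CP
tensor factorization, and the exponent $\mathcal{I}_{i,j,t}$ is the indicator
of Section 3.1, which restricts the product to the observed entries.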

Following the treatment of predictions in [5], we also incorporate the
logistic function $g(x) = 1/(1 + \exp(-x))$, which maps the product of the
latent factors $U$, $V$ and $R$ used for missing link pattern prediction into
the range [0,1]. The transformed predictive distribution then reads:

$$
p(\mathcal{Y} \mid U, V, R, \alpha) = \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{t=1}^{T}
\Big[ \mathcal{N}\Big( y_{i,j,t} \,\Big|\, g\Big(\sum_{d=1}^{D} U_{i,d} V_{j,d} R_{t,d}\Big),\ \alpha^{-1} \Big) \Big]^{\mathcal{I}_{i,j,t}} \tag{3}
$$

Note that the sigmoid transformation can somewhat improve the prediction
performance.
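The likelihood with the logistic transformation can be evaluated numerically as in the sketch below. It assumes that only observed entries (indicator equal to 1) contribute, and the function names `sigmoid` and `log_likelihood` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Logistic function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(Y, I, U, V, R, alpha):
    """Gaussian log-likelihood of the observed entries with a sigmoid mean."""
    mean = sigmoid(np.einsum("id,jd,td->ijt", U, V, R))
    resid = np.where(I > 0, Y - mean, 0.0)  # unobserved entries contribute nothing
    n_obs = I.sum()
    return 0.5 * n_obs * np.log(alpha / (2.0 * np.pi)) - 0.5 * alpha * np.sum(resid ** 2)

# Toy check: all-zero factors give a mean of 0.5 for every entry.
Y = np.ones((2, 2, 1))
I = np.ones((2, 2, 1))
U = np.zeros((2, 1))
V = np.zeros((2, 1))
R = np.zeros((1, 1))
ll = log_likelihood(Y, I, U, V, R, alpha=1.0)
```

Working in log space avoids the underflow that the raw product over all observed entries would cause on larger tensors.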

1 For convenience, we simply write $\mathcal{Y}$ for the observed part of the data $y_{i,j,t}$.



3.3 Basic Model with Non-informative Priors 



Since the observed data in the tensor model are very sparse, we adopt the
usual Bayesian approach and impose zero-mean, independent multivariate
Gaussian prior distributions on the latent feature factor matrices $U$, $V$
for object pairs:

$$
p(U \mid \alpha_U) = \prod_{i=1}^{N} \mathcal{N}(U_i \mid 0, \alpha_U^{-1} I) \tag{4}
$$

$$
p(V \mid \alpha_V) = \prod_{j=1}^{N} \mathcal{N}(V_j \mid 0, \alpha_V^{-1} I) \tag{5}
$$

where $I$ is the D-by-D identity matrix.

As for the latent feature factor for relation types, we believe that in the
contexts of different relation types a pair of objects will not have the same
interaction pattern, which means that the relation types have independent
feature vectors in the latent factor $R$. Therefore, we place a similar
Gaussian prior distribution on the latent factor $R$:

$$
p(R \mid \alpha_R) = \prod_{t=1}^{T} \mathcal{N}(R_t \mid 0, \alpha_R^{-1} I) \tag{6}
$$

Based on the prior distributions placed on the latent factors, we obtain the
predictive distribution for the unobserved link patterns $\mathcal{Y}^*$
conditioned on the observed data $\mathcal{Y}$:

$$
p(\mathcal{Y}^* \mid \mathcal{Y}) = \int p(\mathcal{Y}^* \mid U, V, R, \alpha)\,
p(U, V, R, \alpha \mid \mathcal{Y})\, d\{U, V, R, \alpha, \alpha_U, \alpha_V, \alpha_R\} \tag{7}
$$

Given the above model specification for directed relations, the probability
of a link pattern between an object pair can be interpreted as follows: the
link patterns involving multiple relation types depend not only on how similar
the sender-specific and receiver-specific latent feature factors U and V are,
but also on how well the corresponding latent features of the object pair
match the "context" latent factor R representing the different relation types.
For example, within the same company, someone who likes sports may connect to
colleagues who are fellow gym-goers and be less likely to interact with alumni
who love shopping. The characteristics of people thus influence the
probability of the link pattern across a number of different spheres of
interaction, i.e. relation types.

3.4 Parameter Estimation 



Considering the distribution in Equation (7), note that the predictive
distribution is averaged over the posterior distribution
$p(U, V, R, \alpha \mid \mathcal{Y})$. We can infer the model parameters
$\{U, V, R, \alpha\}$ by maximizing the log-posterior distribution over them:

$$
\ln p(U, V, R, \alpha \mid \mathcal{Y}) = \ln p(\mathcal{Y} \mid U, V, R, \alpha)
+ \ln p(U \mid \alpha_U) + \ln p(V \mid \alpha_V) + \ln p(R \mid \alpha_R) + C \tag{8}
$$

where $C$ is a constant.

With fixed values of the hyperparameters $\{\alpha, \alpha_U, \alpha_V, \alpha_R\}$,
maximizing the posterior over the latent factors $\{U, V, R\}$ is equivalent
to minimizing the following regularized weighted error function [17]:

$$
E = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T} \mathcal{I}_{i,j,t}
\Big( y_{i,j,t} - \sum_{d=1}^{D} U_{i,d} V_{j,d} R_{t,d} \Big)^2
+ \frac{\gamma_U}{2} \sum_{i=1}^{N} \|U_i\|_F^2
+ \frac{\gamma_V}{2} \sum_{j=1}^{N} \|V_j\|_F^2
+ \frac{\gamma_R}{2} \sum_{t=1}^{T} \|R_t\|_F^2 \tag{9}
$$

where $\gamma_U = \alpha_U / \alpha$, $\gamma_V = \alpha_V / \alpha$,
$\gamma_R = \alpha_R / \alpha$, $\|\cdot\|_F$ denotes the Frobenius norm, and
$\mathcal{I}$ is the indicator tensor of Section 3.1. To optimize the latent
factors in the error function, we can use the Polak-Ribiere variant of the
non-linear conjugate gradient method [19] to find a locally optimal solution
for $\{U, V, R\}$. The inference procedure is outlined in Algorithm 1; more
details can be found in [17].
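The regularized weighted error function and its gradients with respect to the three factor matrices can be sketched as follows. The sketch assumes the linear (no sigmoid) mean of Equation (2); the function name `pltf_error_and_grads` and the toy data are hypothetical.

```python
import numpy as np

def pltf_error_and_grads(Y, I, U, V, R, gamma_u, gamma_v, gamma_r):
    """Regularized weighted squared error and its gradients for U, V and R."""
    Y_hat = np.einsum("id,jd,td->ijt", U, V, R)
    resid = np.where(I > 0, Y - Y_hat, 0.0)  # only observed entries count
    error = 0.5 * np.sum(resid ** 2) + 0.5 * (
        gamma_u * np.sum(U ** 2) + gamma_v * np.sum(V ** 2) + gamma_r * np.sum(R ** 2)
    )
    # d(error)/dU[i,d] = -sum_{j,t} resid[i,j,t] * V[j,d] * R[t,d] + gamma_u * U[i,d]
    grad_U = -np.einsum("ijt,jd,td->id", resid, V, R) + gamma_u * U
    grad_V = -np.einsum("ijt,id,td->jd", resid, U, R) + gamma_v * V
    grad_R = -np.einsum("ijt,id,jd->td", resid, U, V) + gamma_r * R
    return error, (grad_U, grad_V, grad_R)

# Hypothetical toy problem with one unobserved link pattern.
rng = np.random.default_rng(0)
N, T, D = 4, 2, 2
Y = (rng.random((N, N, T)) < 0.4).astype(float)
I = np.ones((N, N, T))
I[0, 1, :] = 0.0
U, V, R = (rng.normal(size=(n, D)) for n in (N, N, T))
error, (grad_U, grad_V, grad_R) = pltf_error_and_grads(Y, I, U, V, R, 0.01, 0.01, 0.01)
```

These gradients are what any first-order optimizer, including the conjugate gradient variant cited in the text, would consume.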



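Algorithm 1 Conjugate Gradient method for PLTF model
  Initialize latent factor parameters $\{U_0, V_0, R_0\}$
  repeat
    for $i = 1$ to $N$ do
      $U_i^* \leftarrow U_i + \alpha (\beta \Delta_U - \partial E / \partial U_i)$
    end for
    for $j = 1$ to $N$ do
      $V_j^* \leftarrow V_j + \alpha (\beta \Delta_V - \partial E / \partial V_j)$
    end for
    for $t = 1$ to $T$ do
      $R_t^* \leftarrow R_t + \alpha (\beta \Delta_R - \partial E / \partial R_t)$
    end for
  until stopping criterion is met
  Return $\{U^*, V^*, R^*\}$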
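For illustration, the MAP estimation loop can be sketched with plain gradient descent in place of the Polak-Ribiere conjugate gradient update of Algorithm 1, which keeps the example short. The step size, iteration count, initialization scale and the name `fit_pltf_map` are assumptions of this sketch, and all three factors are updated from the same residual rather than one after the other.

```python
import numpy as np

def fit_pltf_map(Y, I, D=2, gamma=0.01, step=0.01, n_iters=2000, seed=0):
    """Minimize the regularized weighted error with plain gradient descent."""
    rng = np.random.default_rng(seed)
    N, _, T = Y.shape
    U = 0.1 * rng.normal(size=(N, D))
    V = 0.1 * rng.normal(size=(N, D))
    R = 0.1 * rng.normal(size=(T, D))
    Y_obs = np.where(I > 0, Y, 0.0)  # unknown entries are masked by I below
    history = []
    for _ in range(n_iters):
        resid = I * (Y_obs - np.einsum("id,jd,td->ijt", U, V, R))
        reg = np.sum(U ** 2) + np.sum(V ** 2) + np.sum(R ** 2)
        history.append(0.5 * np.sum(resid ** 2) + 0.5 * gamma * reg)
        # Gradients of the error function with respect to each factor matrix.
        grad_U = -np.einsum("ijt,jd,td->id", resid, V, R) + gamma * U
        grad_V = -np.einsum("ijt,id,td->jd", resid, U, R) + gamma * V
        grad_R = -np.einsum("ijt,id,jd->td", resid, U, V) + gamma * R
        U, V, R = U - step * grad_U, V - step * grad_V, R - step * grad_R
    return U, V, R, history

# Hypothetical toy data: 6 objects, 3 relation types, one held-out link pattern.
rng = np.random.default_rng(1)
Y = (rng.random((6, 6, 3)) < 0.3).astype(float)
I = np.ones_like(Y)
I[0, 1, :] = 0.0
U, V, R, history = fit_pltf_map(Y, I)
```

The returned `history` records the error before each update and can serve as a simple stopping criterion.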



Limitations of MAP Estimation. After obtaining the estimates for the latent
factors $\{U^*, V^*, R^*\}$, we can predict the missing link patterns, that is,
simultaneously infer multiple relation types between object pairs, by Equation
(7). However, MAP estimation has limitations. First, since the observed data
are sparse and MAP estimation chooses a single point $\{U^*, V^*, R^*\}$,
there is no model averaging, which may lead to higher variance in the final
prediction; point estimators also ignore the uncertainty in the model
parameters $\{U, V, R\}$.

Another limitation of the basic model is the manual tuning of the parameters
$\{\alpha, \alpha_U, \alpha_V, \alpha_R, \beta\}$. We could consider a set of
candidate values, learn the model for each of them, and select the best ones
by cross-validation. However, this approach is computationally expensive and
quickly becomes infeasible. Therefore, in the next section we introduce a
fully Bayesian treatment of the proposed PLTF model that avoids this drawback
in model parameter learning.

4 Hierarchical Bayesian Treatment for PLTF Model 

The aforementioned inference procedure for learning latent factors by MAP
estimation tends to overfit when the model parameters are not properly
selected, particularly on large-scale and sparse datasets. We therefore
develop an alternative solution that employs hierarchical Bayesian learning
for the PLTF model. The Bayesian treatment integrates out all model parameters
and hyperparameters, which makes the predictive distribution less likely to
overfit the observed data and lets it generalize well to the missing data.

4.1 Hierarchical Bayesian Model 

As in the PLTF model, the observed data are modeled using a multivariate
Gaussian likelihood given by the latent tensor factorization in Equation (2),
so the observed data follow the generative process:

$$
y_{i,j,t} \mid U, V, R, \alpha \sim \mathcal{N}\Big( \sum_{d=1}^{D} U_{i,d} V_{j,d} R_{t,d},\ \alpha^{-1} \Big) \tag{10}
$$

In the hierarchical Bayesian generative model, we select a conjugate prior on
the precision $\alpha$, namely a Gamma distribution with shape and scale
parameters $k$ and $\theta$:

$$
p(\alpha \mid k, \theta) = \mathrm{Gamma}(k, \theta)
= \frac{\theta^{-k}}{\Gamma(k)}\, \alpha^{k-1} \exp\Big(-\frac{\alpha}{\theta}\Big) \tag{11}
$$

Moreover, we choose the prior distributions for the latent factors of objects
and relation types as multivariate Gaussian distributions with mean vector
$\mu$ and precision matrix $\Lambda$ (inverse of the covariance matrix):

$$
p(U \mid \mu_U, \Lambda_U) = \prod_{i=1}^{N} \mathcal{N}(U_i \mid \mu_U, \Lambda_U^{-1}) \tag{12}
$$

$$
p(V \mid \mu_V, \Lambda_V) = \prod_{j=1}^{N} \mathcal{N}(V_j \mid \mu_V, \Lambda_V^{-1}) \tag{13}
$$

$$
p(R \mid \mu_R, \Lambda_R) = \prod_{t=1}^{T} \mathcal{N}(R_t \mid \mu_R, \Lambda_R^{-1}) \tag{14}
$$

Fig. 3. Graphical representation for the Hierarchical Bayesian treatment of
the Probabilistic Latent Tensor Factorization model. Shaded nodes
$\{U, V, R, \alpha\}$ indicate the latent factor parameters; $y_{i,j,t}$ and
$y^*_{i,j,t}$ denote the probability of the observed relation and of the
missing one. Hollow nodes denote the parameters
$\{\mu_U, \mu_V, \mu_R, \Lambda_U, \Lambda_V, \Lambda_R\}$ and hyperparameters
$\{k, \theta, \mu_0, \kappa_T, \kappa_0, W_0, \nu_0\}$. The weight indicator is
elided.

Thus, the above prior distributions on the latent factors
$\{U, V, R, \alpha\}$ are conjugate to the Gaussian likelihood in Equation
(10). We still need to select prior distributions for the parameters
$\{\mu, \Lambda\}$ of the latent factors. To set the multivariate Gaussian
parameters and to facilitate the subsequent learning procedure, we choose
conjugate Gaussian-Wishart priors for the means $\mu$ and precision matrices
$\Lambda$:

$$
p(\mu_U, \Lambda_U) = p(\mu_U \mid \Lambda_U)\, p(\Lambda_U)
= \mathcal{N}(\mu_U \mid \mu_0, (\kappa_0 \Lambda_U)^{-1})\, \mathcal{W}(\Lambda_U \mid W_0, \nu_0) \tag{15}
$$

$$
p(\mu_V, \Lambda_V) = p(\mu_V \mid \Lambda_V)\, p(\Lambda_V)
= \mathcal{N}(\mu_V \mid \mu_0, (\kappa_0 \Lambda_V)^{-1})\, \mathcal{W}(\Lambda_V \mid W_0, \nu_0) \tag{16}
$$

$$
p(\mu_R, \Lambda_R) = p(\mu_R \mid \Lambda_R)\, p(\Lambda_R)
= \mathcal{N}(\mu_R \mid \mu_0, (\kappa_T \Lambda_R)^{-1})\, \mathcal{W}(\Lambda_R \mid W_0, \nu_0) \tag{17}
$$

where $\mathcal{W}$ is the Wishart distribution of a $D \times D$ random
matrix $\Lambda$ with scale matrix $W_0$ and $\nu_0$ degrees of freedom.

Figure 3 contains a graphical representation of our proposed Hierarchical
Bayesian model. The predictive distribution of the missing link patterns over
all the model parameters and hyperparameters, conditioned on the observed
data, is obtained as follows:

$$
p(\mathcal{Y}^* \mid \mathcal{Y}) = \int p(\mathcal{Y}^* \mid U, V, R, \alpha)\,
p(U, V, R, \alpha \mid \mu, \Lambda, \mathcal{Y})\,
p(\mu, \Lambda \mid \Theta)\, d\{U, V, R, \alpha, \mu, \Lambda\} \tag{18}
$$

where $\mu$ denotes $\{\mu_U, \mu_V, \mu_R\}$, $\Lambda$ denotes
$\{\Lambda_U, \Lambda_V, \Lambda_R\}$, and $\Theta$ denotes the
hyperparameters $\{k, \theta, \mu_0, \kappa_T, \kappa_0, W_0, \nu_0\}$ of the
conjugate prior distributions, which are specified in the Bayesian learning.

Thanks to the hierarchical Bayesian treatment, the modified PLTF model
marginalizes over all model parameters and hyperparameters and thereby
performs more effectively. Moreover, Bayesian learning tunes the values of the
hyperparameters within a reasonable region of the model space, and the
performance of our model changes little across such settings. We refer to the
resulting model as the Hierarchical Bayesian Probabilistic Latent Tensor
Factorization (HB-PLTF) model, and we show an efficient inference procedure
for estimating its model parameters and hyperparameters in the next
subsection.

4.2 Inference with Markov Chain Monte Carlo 

The exact solution to the predictive distribution in Equation (18) is
analytically intractable due to the difficulty of computing the posterior
distribution. We thus resort to approximate inference. In this paper, we
employ a sampling-based method, namely the Markov Chain Monte Carlo (MCMC)
[5] [16] technique.

From the MCMC sampling-based point of view, the predictive distribution of
Equation (18) can be approximated as follows:

$$
p(\mathcal{Y}^* \mid \mathcal{Y}) \approx \frac{1}{K} \sum_{k=1}^{K}
p(\mathcal{Y}^* \mid U^{(k)}, V^{(k)}, R^{(k)}, \alpha^{(k)})
$$

where the prediction is approximated by the expectation of
$p(\mathcal{Y}^* \mid U^{(k)}, V^{(k)}, R^{(k)}, \alpha^{(k)})$ over a sequence
of samples $\{U^{(k)}, V^{(k)}, R^{(k)}, \alpha^{(k)}\}$ drawn from a Markov
chain whose stationary distribution is $p(U, V, R, \alpha \mid \mathcal{Y})$,
the posterior distribution over the model parameters.
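The Monte Carlo approximation can be sketched as an average over samples. The sketch below assumes that the quantity of interest is the predictive mean under the linear Gaussian likelihood, so each sample contributes its CP reconstruction; `predict_from_samples` is a hypothetical name and the samples here are random stand-ins rather than real Gibbs draws.

```python
import numpy as np

def predict_from_samples(samples):
    """Average the CP reconstruction over a list of (U, V, R) samples."""
    total = 0.0
    for U, V, R in samples:
        total = total + np.einsum("id,jd,td->ijt", U, V, R)
    return total / len(samples)

# Stand-in samples: K = 5 draws for N = 3 objects, T = 2 relation types, D = 2.
rng = np.random.default_rng(0)
samples = [tuple(rng.normal(size=(n, 2)) for n in (3, 3, 2)) for _ in range(5)]
Y_star = predict_from_samples(samples)
```

Averaging predictions rather than averaging the factors themselves matters, because the CP reconstruction is not linear in the joint set of factors.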

In our work, we use Gibbs sampling, one of the most widely used MCMC
techniques, which draws from conditional distributions over the latent
variables that have a parametric form that is easy to sample from [20]. During
Gibbs sampling, the latent variables (model parameters and hyperparameters)
are partitioned into several blocks. Starting from some initial values, each
block is sampled in turn while all the others are held fixed, and this sweep
is iterated until convergence.

To apply the Gibbs sampling method in our Bayesian probabilistic model, we
need to derive the appropriate conditional distributions based on the
predictive distribution of Equation (18). The use of conjugate priors for the
model parameters and hyperparameters makes the posterior conditional
distributions easy to sample from, which leads to an efficient sampling
procedure. According to Bayes' rule and Equation (18), the posterior
distribution conditioned on the observed data is:

$$
\begin{aligned}
p(U, V, R, \alpha \mid \mathcal{Y}) \propto\ & p(\mathcal{Y} \mid U, V, R, \alpha)\, p(\alpha \mid k, \theta)\,
p(U \mid \mu_U, \Lambda_U)\, p(V \mid \mu_V, \Lambda_V) \\
& p(R \mid \mu_R, \Lambda_R)\, p(\mu_U, \Lambda_U)\, p(\mu_V, \Lambda_V)\, p(\mu_R, \Lambda_R)
\end{aligned} \tag{19}
$$

The whole Gibbs sampling procedure is illustrated in Algorithm 2.



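Algorithm 2 Gibbs sampling for Bayesian PLTF model
  Initialize latent factor parameters $\{U_0, V_0, R_0\}$, $\alpha_0$
  for $k = 1$ to $K$ do
    Sample the model parameter and hyperparameters based on Equations
    (23), (24), (25), respectively:
      $\alpha^{(k)*} \leftarrow$ sample from $\mathrm{Gamma}(k^*, \theta^* \mid U^{(k)}, V^{(k)}, R^{(k)}, \mathcal{Y})$
      $\{\mu_U^{(k)}, \Lambda_U^{(k)}\} \leftarrow$ sample from $p(\mu_U^{(k)}, \Lambda_U^{(k)} \mid U^{(k)})$
      $\{\mu_V^{(k)}, \Lambda_V^{(k)}\} \leftarrow$ sample from $p(\mu_V^{(k)}, \Lambda_V^{(k)} \mid V^{(k)})$
      $\{\mu_R^{(k)}, \Lambda_R^{(k)}\} \leftarrow$ sample from $p(\mu_R^{(k)}, \Lambda_R^{(k)} \mid R^{(k)})$
    for $i = 1$ to $N$ do
      $U_i^{(k+1)*} \leftarrow p(U_i \mid V^{(k)*}, R^{(k)*}, \alpha^{(k)*}, \mathcal{Y}, \mu_U^{(k)}, \Lambda_U^{(k)})$
    end for
    for $j = 1$ to $N$ do
      $V_j^{(k+1)*} \leftarrow p(V_j \mid U^{(k+1)*}, R^{(k)*}, \alpha^{(k)*}, \mathcal{Y}, \mu_V^{(k)}, \Lambda_V^{(k)})$
    end for
    for $t = 1$ to $T$ do
      $R_t^{(k+1)*} \leftarrow p(R_t \mid U^{(k+1)*}, V^{(k+1)*}, \alpha^{(k)*}, \mathcal{Y}, \mu_R^{(k)}, \Lambda_R^{(k)})$
    end for
  end for
  Return $\{U^*, V^*, R^*, \alpha^*, \mu, \Lambda\}$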



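Posterior Distributions of Latent Factors. To learn the latent feature
factor $U^*$, we derive the posterior conditional distribution of the
corresponding samples, which is a multivariate Gaussian distribution
conditioned on the other latent factors and the observed data:

$$
p(U^* \mid V^*, R^*, \mathcal{Y}, \mu_U, \Lambda_U) = \prod_{i=1}^{N} \mathcal{N}(U_i^* \mid \mu_i^*, (\Lambda_i^*)^{-1}) \tag{21}
$$

with the posterior precision matrix and the mean:

$$
\Lambda_i^* = \Lambda_U + \alpha \sum_{j=1}^{N} \sum_{t=1}^{T} \mathcal{I}_{i,j,t}\, (V_j \cdot R_t)(V_j \cdot R_t)^{\top}
$$

$$
\mu_i^* = (\Lambda_i^*)^{-1} \Big( \Lambda_U \mu_U + \alpha \sum_{j=1}^{N} \sum_{t=1}^{T} \mathcal{I}_{i,j,t}\, y_{i,j,t}\, (V_j \cdot R_t) \Big)
$$

where the symbol "$\cdot$" denotes the element-wise product of two vectors,
the rows $U_i$, $V_j$ and $R_t$ are treated as $D$-dimensional column vectors,
and $\mathcal{I}$ is the indicator tensor of Section 3.1. The posterior
conditional distribution for the latent factor $V^*$ is derived in the same
manner and has the same form as that for $U^*$.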
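A single conditional draw for one row $U_i$ can be sketched as below. The sketch assumes the linear Gaussian likelihood of Equation (10) and treats factor rows as $D$-dimensional vectors; the name `sample_U_row` and the toy data are hypothetical.

```python
import numpy as np

def sample_U_row(i, Y, I, V, R, alpha, mu_U, Lam_U, rng):
    """Draw U_i from its Gaussian conditional; also return its mean and precision."""
    Q = np.einsum("jd,td->jtd", V, R)   # Q[j, t] = element-wise product V_j * R_t
    w = I[i]                            # (N, T) indicator slice for object i
    y = np.where(w > 0, Y[i], 0.0)      # unknown entries are masked out
    Lam_star = Lam_U + alpha * np.einsum("jt,jtd,jte->de", w, Q, Q)
    b = Lam_U @ mu_U + alpha * np.einsum("jt,jtd->d", w * y, Q)
    cov = np.linalg.inv(Lam_star)
    mu_star = cov @ b
    return rng.multivariate_normal(mu_star, cov), mu_star, Lam_star

# Noise-free toy data generated from known factors.
rng = np.random.default_rng(0)
N, T, D = 6, 4, 2
U_true, V, R = (rng.normal(size=(n, D)) for n in (N, N, T))
Y = np.einsum("id,jd,td->ijt", U_true, V, R)
I = np.ones((N, N, T))
draw, mu_star, Lam_star = sample_U_row(
    0, Y, I, V, R, alpha=1e6, mu_U=np.zeros(D), Lam_U=np.eye(D), rng=rng
)
```

With a very large precision and noise-free data, the conditional mean should sit close to the row that generated the data, which gives a quick sanity check on the update.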

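As for the latent factor $R^*$ of relation types, we assume it is influenced
by the interaction between the two latent feature factors $U^*$ and $V^*$ of
the objects. Its posterior conditional distribution has the same parametric
form as its prior, namely a multivariate Gaussian distribution:

$$
p(R^* \mid U^*, V^*, \mathcal{Y}, \mu_R, \Lambda_R) = \prod_{t=1}^{T} \mathcal{N}(R_t^* \mid \mu_t^*, (\Lambda_t^*)^{-1}) \tag{22}
$$

where

$$
\Lambda_t^* = \Lambda_R + \alpha \sum_{i=1}^{N} \sum_{j=1}^{N} \mathcal{I}_{i,j,t}\, (U_i \cdot V_j)(U_i \cdot V_j)^{\top}
$$

$$
\mu_t^* = (\Lambda_t^*)^{-1} \Big( \Lambda_R \mu_R + \alpha \sum_{i=1}^{N} \sum_{j=1}^{N} \mathcal{I}_{i,j,t}\, y_{i,j,t}\, (U_i \cdot V_j) \Big)
$$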

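Posterior Distributions of Model Parameters. From the form of the posterior
distribution with the aforementioned conjugate priors, we can derive the
posterior conditional distribution for the samples $\{\alpha^*\}$, which
follow the Gamma distribution:

$$
p(\alpha^* \mid k^*, \theta^*) = \mathrm{Gamma}(k^*, \theta^*) \tag{23}
$$

where

$$
k^* = k + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T} \mathcal{I}_{i,j,t}
$$

$$
\theta^* = \Big( \theta^{-1} + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T} \mathcal{I}_{i,j,t}
\Big( y_{i,j,t} - \sum_{d=1}^{D} U_{i,d} V_{j,d} R_{t,d} \Big)^2 \Big)^{-1}
$$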
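The conditional update for the precision can be sketched as follows, assuming the shape and scale parameterization of the Gamma prior used in Section 4.1. The function name `sample_alpha` and the toy values are hypothetical.

```python
import numpy as np

def sample_alpha(Y, I, U, V, R, k, theta, rng):
    """Draw the precision from its Gamma conditional; also return (k*, theta*)."""
    Y_hat = np.einsum("id,jd,td->ijt", U, V, R)
    resid = np.where(I > 0, Y - Y_hat, 0.0)
    k_star = k + 0.5 * I.sum()                                # shape grows with #observed
    theta_star = 1.0 / (1.0 / theta + 0.5 * np.sum(resid ** 2))  # scale shrinks with error
    return rng.gamma(shape=k_star, scale=theta_star), k_star, theta_star

# Toy check: a perfect fit on 4 observed entries leaves the scale unchanged.
rng = np.random.default_rng(0)
Y = np.ones((2, 2, 1))
I = np.ones((2, 2, 1))
U = np.ones((2, 1))
V = np.ones((2, 1))
R = np.ones((1, 1))
alpha, k_star, theta_star = sample_alpha(Y, I, U, V, R, k=5.0, theta=1.0, rng=rng)
```

Larger residuals shrink the scale and therefore pull the sampled precision down, which is how the sampler adapts the noise level to the fit.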

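Note that to obtain the conditional distributions over the latent factors, we
also need to derive the posterior conditional distributions for the model
hyperparameters, i.e., the means $\{\mu\}$ and the precision matrices
$\{\Lambda\}$, which are sampled jointly.

Considering the conjugate prior distribution of $\{\mu_U, \Lambda_U\}$ in
Equation (15), the conditional distribution takes the form of a
Gaussian-Wishart distribution:

$$
p(\mu_U, \Lambda_U \mid U) = \mathcal{N}(\mu_U \mid \mu_0^*, (\kappa_0^* \Lambda_U)^{-1})\, \mathcal{W}(\Lambda_U \mid W_0^*, \nu_0^*) \tag{24}
$$

with

$$
\mu_0^* = \frac{\kappa_0 \mu_0 + N \bar{U}}{\kappa_0 + N}, \quad
\kappa_0^* = \kappa_0 + N, \quad \nu_0^* = \nu_0 + N;
$$

$$
(W_0^*)^{-1} = W_0^{-1} + C + \frac{\kappa_0 N}{\kappa_0 + N} (\bar{U} - \mu_0)(\bar{U} - \mu_0)^{\top}
$$

$$
C = \sum_{i=1}^{N} (U_i - \bar{U})(U_i - \bar{U})^{\top}
$$

where $\bar{U} = \frac{1}{N} \sum_{i=1}^{N} U_i$ is the sample mean. Similarly,
the conditional distribution for $\{\mu_V, \Lambda_V\}$ has the same form.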
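The Gaussian-Wishart update can be sketched in two steps: compute the posterior parameters, then draw the precision matrix and the mean. The Wishart draw below uses a sum of outer products of Gaussian vectors, which is valid for integer degrees of freedom; this choice, the function names and the toy input are assumptions of the sketch.

```python
import numpy as np

def gaussian_wishart_posterior(U, mu0, kappa0, W0, nu0):
    """Posterior Gaussian-Wishart parameters given the rows of a factor matrix U."""
    N = U.shape[0]
    U_bar = U.mean(axis=0)
    centered = U - U_bar
    C = centered.T @ centered                       # sum of centered outer products
    kappa_star = kappa0 + N
    nu_star = nu0 + N
    mu_star = (kappa0 * mu0 + N * U_bar) / kappa_star
    diff = (U_bar - mu0)[:, None]
    W_inv = np.linalg.inv(W0) + C + (kappa0 * N / kappa_star) * (diff @ diff.T)
    return mu_star, kappa_star, np.linalg.inv(W_inv), nu_star

def sample_hyperparameters(U, mu0, kappa0, W0, nu0, rng):
    """Draw (mu, Lambda) from the posterior; nu0 must be an integer here."""
    mu_star, kappa_star, W_star, nu_star = gaussian_wishart_posterior(U, mu0, kappa0, W0, nu0)
    W_star = 0.5 * (W_star + W_star.T)              # remove round-off asymmetry
    X = rng.multivariate_normal(np.zeros(len(mu0)), W_star, size=nu_star)
    Lam = X.T @ X                                   # Wishart(W_star, nu_star) draw
    mu = rng.multivariate_normal(mu_star, np.linalg.inv(kappa_star * Lam))
    return mu, Lam

# Toy input: two factor rows in D = 2 dimensions.
U = np.array([[1.0, 0.0], [3.0, 0.0]])
mu0 = np.zeros(2)
mu_star, kappa_star, W_star, nu_star = gaussian_wishart_posterior(U, mu0, 2, np.eye(2), 2)
mu, Lam = sample_hyperparameters(U, mu0, 2, np.eye(2), 2, np.random.default_rng(0))
```

The same two functions apply unchanged to the factor matrix of the second object role, since its conditional has the same form.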



As for the hyperparameters $\{\mu_R, \Lambda_R\}$ of the latent feature
factor of relation types, we can derive a similar parametric form of the
Gaussian-Wishart distribution:

$$
p(\mu_R, \Lambda_R \mid R) = \mathcal{N}(\mu_R \mid \mu_0^*, (\kappa_T^* \Lambda_R)^{-1})\, \mathcal{W}(\Lambda_R \mid W_0^*, \nu_0^*) \tag{25}
$$

with the small difference that $\kappa_T^* = \kappa_T + T$; the mean
$\mu_0^*$ and the scale matrix $W_0^*$ have the following forms:

$$
\mu_0^* = \frac{\kappa_T \mu_0 + T \bar{R}}{\kappa_T + T}, \quad
\nu_0^* = \nu_0 + T, \quad
S = \sum_{t=1}^{T} (R_t - \bar{R})(R_t - \bar{R})^{\top};
$$

$$
(W_0^*)^{-1} = W_0^{-1} + S + \frac{\kappa_T T}{\kappa_T + T} (\bar{R} - \mu_0)(\bar{R} - \mu_0)^{\top}
$$

where $\bar{R} = \frac{1}{T} \sum_{t=1}^{T} R_t$ denotes the sample mean.



4.3 Computational Complexity Analysis 

In this subsection, we discuss the computational complexity of the proposed
algorithms. Per iteration, the non-Bayesian PLTF model and the Hierarchical
Bayesian model both require $O(2ND^3 + TD^3 + |\mathcal{Y}| D^2)$
computational time, where $|\mathcal{Y}|$ denotes the number of observed
relations in the learning phase. The value of $D$ can be chosen according to
the tradeoff between model complexity and learning efficiency. As for the
hyperparameters, the Hierarchical Bayesian model removes the burden of manual
adjustment by introducing prior distributions for them. Since the Gibbs
sampler usually takes a long time to converge, the MAP results from the PLTF
model can be used to initialize it. Moreover, we can stop sampling once the
accuracy achieves the desired level in the experiments.



5 Experiments 

We evaluate how well our proposed models discover missing link patterns on
three real-world multi-relational datasets and then discuss the results.



5.1 Datasets 

In the experiments, we examine how our proposed models behave on real-world 
multi-relational networks. Three datasets are collected: Kinship relational dataset, 
Country relational dataset and YouTube social dataset. 



— Kinship relational dataset. The Kinship relational dataset consists of 
kinship relationships among the members of the Alyawarra tribe [21]. We
extract 26 relation types among 104 people in the dataset (such as "father"
or "wife" relations), and then build the multi-relational network. For
evaluation, we construct the tensor model of size $104 \times 104 \times 26$.

— Country relational dataset. The Country relational dataset consists of 
international relations among different countries in the world [22]. We extract 
56 relation types among 14 countries in the dataset (such as "emigrants" or
"exports"), and then build the multi-relational network for this dataset. For
evaluation, we construct the tensor model of size $14 \times 14 \times 56$.

— Youtube social dataset. YouTube is currently the most popular video 
sharing web site, which allows users to interact with each other in multiple 
relations such as contacts, subscriptions or sharing favorite videos (see
footnote 2). We choose 3,000 active user profiles in the network, and construct
the tensor model of size $3000 \times 3000 \times 5$ with five types of relations.

5.2 Experimental Setup 

We apply our proposed PLTF model and the hierarchical Bayesian version for 
Link Pattern Prediction (LPP) problem. In addition, we test the Bayesian treat- 
ment of probabilistic latent factorization model by initializing the Gibbs sampler 
with either random latent factor matrices (denoted as "HB-rPLTF") or the la- 
tent factors obtained by training the PLTF model (denoted as "HB-tPLTF"). 
We also compare these methods with the other state-of-the-art methods. 

— LPP-LFRM model (Latent Feature Relational Model) [9]. This is a non- 
parametric latent feature relational model which infers a global set of latent 
binary features for each object as well as how those latent features interact 
in the multi-relational networks. 

— LPP-BPMF model (Bayesian Probabilistic Matrix Factorization) [5] . This is 
a probabilistic matrix factorization model applied separately to each relation 
slice y ::t of the original y tensor data. Each relation within this procedure 
is thus handled independently of the other relations in the network. We 
implement and report the performance of this model in the LPP problem. 
Comparing our PLTF model with this mono-relational method allows us to
examine the benefit, if any, of multi-relational prediction.

Each tube fiber in our PLTF tensor model represents a link pattern between
two objects. For the experiments, we hold out a given fraction (e.g., 20%) of
the tube fibers as unobserved link patterns for the test data. For the
Hierarchical Bayesian treatment of our proposed PLTF model, parameters are set
according to prior knowledge without tuning: $\mu_0 = 0$, $\nu_0 = D$,
$W_0 = I$, $k = 5$, $\theta = 1$, $\kappa_0 = 2$, $\kappa_T = 1$. The
algorithms are implemented in Matlab. We use the AUC score (Area Under the
receiver operating characteristic Curve), which is a robust measure for sparse
datasets [2], as the evaluation metric to test the link pattern prediction
performance. We evaluate the methods by repeating the process five times and
report the average results.

2 http://www.public.asu.edu/~ltang9

Method       Kinship Dataset   Countries Dataset
HB-tPLTF     0.9483            0.9187
HB-rPLTF     0.9401            0.9111
PLTF         0.9269            0.8994
LPP-LFRM     0.9183            0.8772
LPP-BPMF     0.8022            0.7827

Table 1. Average AUC performance of the different methods on the Kinship and
Countries datasets.
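The evaluation protocol of this subsection (hiding a fraction of tube fibers, then scoring predictions by AUC) can be sketched as below. The sketch assumes that all ordered pairs, including self-pairs, are eligible for hold-out and computes AUC from the rank-sum statistic; both helper names are hypothetical and this is not the Matlab code used in the experiments.

```python
import numpy as np

def holdout_fibers(n_objects, n_types, fraction, rng):
    """Return an indicator tensor with a random fraction of tube fibers hidden."""
    I = np.ones((n_objects, n_objects, n_types))
    n_pairs = n_objects * n_objects
    hidden = rng.choice(n_pairs, size=int(round(fraction * n_pairs)), replace=False)
    rows, cols = np.unravel_index(hidden, (n_objects, n_objects))
    I[rows, cols, :] = 0.0  # hide every relation type of the selected pairs
    return I

def auc_score(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) statistic; ties get average ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

I = holdout_fibers(10, 3, 0.2, np.random.default_rng(0))
example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In use, the labels would be the true entries of the hidden fibers and the scores the corresponding predicted values.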

5.3 Experimental Results 

We first compare our proposed PLTF model and its Bayesian variants to the
state-of-the-art model LFRM and the mono-relational model LPP-BPMF. Table 1
reports the average AUC of these models on the Kinship and Countries datasets.
The PLTF model is learned by maximizing the posterior distribution, with the
regularization terms set to $\gamma_U = \gamma_V = \gamma_R = 0.01$. For the
HB-rPLTF and HB-tPLTF models, as well as for the LPP-BPMF model, we generate
300 samples, by which point the results have stabilized.

In Table 1, the best performing methods are the HB-PLTF variants. The
multi-relational models (HB-tPLTF, HB-rPLTF, PLTF and LPP-LFRM) consistently
obtain better results than the mono-relational LPP-BPMF method on the two
multi-relational datasets. This suggests that multi-relational prediction
methods capture the correlations among multiple relation types and thereby
improve the accuracy of link pattern prediction. In contrast, the LPP-BPMF
based method processes each target relation type separately, without
considering the additional cross link pattern information.

Another observation is that our proposed PLTF model and its Bayesian variants
perform better than the LFRM model. A likely reason is that the proposed
PLTF models explore the impact of different relations more effectively by
introducing a latent factor for multiple relation types and capturing the
interactions among the three latent factors of objects and relation types, while
the LFRM model only infers a global set of binary latent feature matrices for all
relations and only considers the latent feature factor of objects.
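
As a rough illustration of what an interaction among three latent factors can look like, the sketch below scores every tensor entry with a three-way inner product (a CP-style form). This specific form, the names `U`, `V`, `R` and `predict_tensor`, and the toy sizes are assumptions made for illustration only; the model definition given earlier in the paper remains authoritative.

```python
import numpy as np

def predict_tensor(U, V, R):
    """Score all (object i, object j, relation k) triples.

    Assumed form: X_hat[i, j, k] = sum_d U[i, d] * V[j, d] * R[k, d],
    where U and V hold object factors and R holds relation-type factors.
    """
    return np.einsum("id,jd,kd->ijk", U, V, R)

# Toy example: 4 objects, 3 relation types, D = 2 latent dimensions.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 2))
V = rng.normal(size=(4, 2))
R = rng.normal(size=(3, 2))
X_hat = predict_tensor(U, V, R)

# The tube fiber X_hat[i, j, :] is the predicted link pattern of pair (i, j):
# one score per relation type.
link_pattern = X_hat[0, 1, :]
```

Under this assumed form, dropping `R` (or fixing it to all ones) collapses the model to a single matrix factorization shared by every relation type, which is one way to see what the extra relation factor contributes.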

Table 1 also compares the PLTF model with its Hierarchical Bayesian versions.
We can observe that the HB-PLTF versions outperform the non-Bayesian PLTF
model, which indicates that the prediction performance can be enhanced by
integrating out model parameters and hyperparameters and by the efficient
procedure of Gibbs sampling in the model space. However, the difference between
the Hierarchical Bayesian versions with random initialization and with PLTF
initialization is small, which suggests that the Bayesian version of the PLTF
model is stable with respect to initialization.





             20%       40%       60%
HB-tPLTF     0.9101    0.8348    0.8107
HB-rPLTF     0.9067    0.8210    0.8001
PLTF         0.8740    0.7999    0.7512
LPP-BPMF     0.7202    0.6575    0.6101

Table 2. Average AUC performance with varying percentages of missing link patterns
on the YouTube dataset
Fig. 4. The impact of latent tensor factorization dimensions on the YouTube dataset

Next, we compare the prediction quality of our proposed PLTF models to the
mono-relational LPP-BPMF model on the YouTube dataset, which is a large-scale
and sparse dataset. The average AUC performance of these models
with varying percentages (20%, 40%, and 60%) of missing link patterns is
reported in Table 2. As we can see, HB-tPLTF provides the best prediction
quality among all the methods. Moreover, all three PLTF variants
clearly outperform the mono-relational LPP-BPMF model. These results
confirm the ability of our PLTF models to deal with multi-relational data
and to capture the correlations among multiple relations, which can be used to
improve the accuracy of link pattern prediction.

Impact of Latent Factorization Dimensions. Parameter D denotes the number
of latent tensor factorization dimensions, which determines both the
number of model parameters and the model complexity. We conduct experiments
on the three datasets to examine how the predictive performance is affected by
D.

Figure 4 shows the impact of the latent factorization dimensions on the
performance of our proposed PLTF model and its Hierarchical Bayesian variants
on the YouTube dataset. From Figure 4 we can observe that, as the number of
factorization dimensions increases, the prediction performance of the PLTF
model does not improve and sometimes even degrades through overfitting, while
the performance of the HB-tPLTF model remains nearly stable. The results again
indicate the advantage of the Bayesian treatment for the probabilistic model.
Considering the tradeoff between model complexity and learning efficiency, we
select D = 20 as the number of factorization dimensions for the YouTube dataset.

Fig. 5. Impact of various relation types on the prediction performance of PLTF model
by incorporating each relation used in the training set compared with the normal result
on YouTube dataset

We omit the detailed results for the other datasets, as a similar trend in
choosing model dimensions can be observed. For the Kinship dataset we find that
the optimal number of factorization dimensions is D = 11, and for the Countries
dataset D = 7.



Impact of Various Relation Types. In multi-relational networks, objects connect
with each other following some interaction pattern within the
context of a specific relation type. Different relation types exhibit distinct
interaction patterns in the network, which are captured by the link pattern
containing multiple relation types. Here we explore the effect of various
relation types on the prediction performance, taking the YouTube dataset as an
example.

The YouTube dataset contains five relation types, among which the Contact
relation is the sparsest while the other relational contexts are denser [23]. We
consider the setting in which a specific relation type is incorporated in the
training set while a 20% fraction is selected as test data. The higher the
resulting performance, the more influence that relation type has
on link pattern prediction. Figure 5 shows the average prediction results.
We can observe that the Favorite relation boosts the performance significantly,
while the Contact relation has the least impact on the results, which is
consistent with our intuition. The results indicate that our proposed model can
not only capture the correlations among relation types but also reveal the
influence of each relation type on the link pattern prediction performance. For
example, people who have shared their favorite videos on YouTube are more likely
to form a Contact relation and subscribe to each other, but not vice versa. We
can also rank the relation types according to their impact on the prediction
performance as follows: Favorite relation ≻ Co-contact relation ≻
Co-subscription relation ≻ Co-subscribed relation ≻ Contact relation.

6 Conclusions and Future Work 

In this paper, we have proposed a new task, the Link Pattern Prediction (LPP)
problem, and developed a Probabilistic Latent Tensor Factorization (PLTF)
model which represents social interaction patterns in multi-relational networks.
To construct the model, we introduce a specific latent factor for the different
relation types in addition to the latent factors that characterize object features.
We also provide a Hierarchical Bayesian treatment of the probabilistic model
to avoid overfitting when solving the LPP problem. For that, we derive an efficient
Gibbs sampling method to learn the model parameters and hyperparameters.
Experiments conducted on several real world datasets demonstrate
significant improvements over several existing state-of-the-art methods, as well
as the ability to capture the correlations among different relation types and to
reveal the impact of distinct relation types in multi-relational networks.

There are several directions for future work that we will consider as the 
extensions of our proposed model. First, it would be interesting to investigate 
the evolutionary aspect of link patterns in the multi-relational networks over 
time. Second, we will consider some applications which can use link patterns 
to improve our understanding of social interaction and large-scale patterns of 
human association. 